Supplementary Material AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture
نویسندگان
چکیده
In all our experiments, at training time we pick top 2000 RPN generated 3D proposals using NMS (non-maximum suppression). At test time we select top 1000 3D proposals. However, a lower number of proposals, e.g. top 300 proposals does not effect the detection performance, and increase the test time detection speed significantly. In Section 3.2, we show that extracting less number of 3D proposals (at test time) does not effect the detection performance. Shaoqing et al. [8] observed the same with Faster-RCNN.
منابع مشابه
Decomposing Motion and Content for Natural Video Sequence Prediction
We propose a deep neural network for the prediction of future frames in natural video sequences. To effectively handle complex evolution of pixels in videos, we propose to decompose the motion and content, two key components generating dynamics in videos. Our model is built upon the Encoder-Decoder Convolutional Neural Network and Convolutional LSTM for pixel-level prediction, which independent...
متن کاملDeep Auxiliary Learning for Visual Localization and Odometry
Localization is an indispensable component of a robot’s autonomy stack that enables it to determine where it is in the environment, essentially making it a precursor for any action execution or planning. Although convolutional neural networks have shown promising results for visual localization, they are still grossly outperformed by state-of-the-art local feature-based techniques. In this work...
متن کاملFully-trainable deep matching
Deep Matching (DM) is a popular high-quality method for quasi-dense image matching. Despite its name, however, the original DM formulation does not yield a deep neural network that can be trained end-to-end via backpropagation. In this paper, we remove this limitation by rewriting the complete DM algorithm as a convolutional neural network. This results in a novel deep architecture for image ma...
متن کاملAn End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos
Deep learning has been demonstrated to achieve excellent results for image classification and object detection. However, the impact of deep learning on video analysis (e.g. action detection and recognition) has not been that significant due to complexity of video data and lack of annotations. In addition, training deep neural networks on large scale video datasets is extremely computationally e...
متن کاملDeep Stereo Matching with Explicit Cost Aggregation Sub-Architecture
Deep neural networks have shown excellent performance for stereo matching. Many efforts focus on the feature extraction and similarity measurement of the matching cost computation step while less attention is paid on cost aggregation which is crucial for stereo matching. In this paper, we present a learning-based cost aggregation method for stereo matching by a novel sub-architecture in the end...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017